Reduce default from 8 to 1 TVU receive socket/threads #998
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #998 +/- ##
=======================================
Coverage 81.9% 81.9%
=======================================
Files 855 855
Lines 232154 232154
=======================================
+ Hits 190170 190172 +2
+ Misses 41984 41982 -2
We should probably rename these (or add some comments), because it is pretty confusing which of the many TVU sockets this refers to. But it looks like these are the shred-fetch-stage sockets (i.e. multiple sockets all bound to the same port).
It seems the motivation for using multiple sockets (for the same port) is explained in solana-labs#1228, in particular to address solana-labs#1224. So, steady-state performance aside, the issue is that under heavy load socket queues are saturated and packets are dropped.
This commit effectively reverts #1228. Isn't there a concern that socket queue saturation and issues like #1224 will arise again?
I'm open to a wider rename to make things more specific; I opted to match existing nomenclature: agave/gossip/src/cluster_info.rs Lines 3020 to 3022 in f5db8d2
Yep, that's right
Yeah, totally agree that we need to provision things to be able to withstand worst-case / heavy load scenarios. A few points:
so how can we ensure this change does not resurface previous issues?
sys-tuner used to do that: But I do not know if the multi-socket trick was added before the above instructions were in use, or whether it was still needed in addition to those instructions.
Good callout, some further reading has me convinced this is the right knob:
And FWIW, tuning those settings manually is effectively "required": agave/core/src/system_monitor_service.rs Lines 403 to 408 in 550f806
Someone can opt out with Lines 1749 to 1750 in 550f806
Let me see if I can do some digging to figure this out. However, I think my earlier comment is very important to note as well:
Drops mean that packets were coming in faster than the validator could consume them. Even with a very large buffer, we theoretically would expect some drops on a long enough time scale. More threads were added to allow a higher overall consumption rate. But, using
Supposing we run a test like the one Brennan mentioned where we have blocks that contain
yeah, I think we just need to stress test this with some massive number of shreds at tvu port. |
Cool, think we're in agreement; will report back with some data once I get the chance to set up and run this experiment
A single streamer is capable of handling the load, and doing so with a single thread is more efficient.
Problem
We currently create 8 sockets/threads to pull out turbine shreds. Using 8 of these sockets is excessive, and actually leads to more inefficiencies (i.e. less densely packed PacketBatch) as some data below will show. We have a 1ms coalesce duration as part of the receiver, but given that we have 8 receivers all running independently, this 1ms coalesce doesn't mean much since they're all competing for packets. Part of work for #35
Summary of Changes
Reduce the default value of sockets/threads from 8 to 1.
One potential concern I could see for reducing the number of receivers would be if network load increased and over-burdened the single streamer. However, bench-streamer would indicate that the streamer can handle MUCH higher loads than what we're seeing.
This is a micro-benchmark so its results need to be noted with caution, but again, this is several orders of magnitude greater than the current mnb load of 2500-3000 inbound shreds per second.
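As a rough back-of-the-envelope sketch of why splitting this load across 8 receivers yields sparse batches, the snippet below uses a deliberately simple model that is an assumption and not the streamer's actual logic: the first packet opens a batch, the receiver keeps collecting for the coalesce window, and packets are spread evenly across receivers. The function name `expected_packets_per_batch` is hypothetical.

```rust
use std::time::Duration;

/// Expected packets per batch for one receiver under a simplified model
/// (an assumption for illustration, not how the streamer is implemented):
/// the first packet opens a batch, then the receiver keeps collecting
/// arrivals for `coalesce`. Traffic is assumed to be split evenly across
/// `receivers`.
fn expected_packets_per_batch(total_pps: f64, receivers: u32, coalesce: Duration) -> f64 {
    // Share of the inbound packet rate that a single receiver sees.
    let per_receiver_pps = total_pps / f64::from(receivers);
    // One packet opens the batch, plus the arrivals expected in the window.
    1.0 + per_receiver_pps * coalesce.as_secs_f64()
}
```

Under this model, 3000 inbound shreds per second with a 1ms window works out to roughly 4 packets per batch for a single receiver and roughly 1.4 for eight, which is in line with the direction of the observation that eight competing receivers make the coalesce window much less effective.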
Testing / Data
I've been running two nodes in an A/B setup; one node with the change and one without. The nodes were restarted with this change around 18:00 UTC on 2024/04/22. We'll examine shred_sigverify stats, as this is where the receivers all send their packets and where some of the gains can be observed.
For starters, we see similar values for num_packets, ignoring the restart spike around 18:00. This is not surprising / a basic sanity check:
However, num_batches is significantly smaller for the node running this branch. To simplify, this graph shows num_packets / num_batches to yield packets-per-batch:
More packets per batch means better locality, which we can see with a ~20% drop in total time spent via elapsed_micros:
And less total time with the same number of packets also means a similar ~20% drop in average time spent per packet.
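For anyone reproducing the comparison, the derived ratios can be sketched as follows. The field names mirror the shred_sigverify counters named above; the `SigverifySample` struct and its methods are hypothetical helpers for illustration and are not part of the codebase.

```rust
/// Hypothetical snapshot of the three shred_sigverify counters discussed
/// above, taken over the same reporting interval.
#[derive(Debug, Clone, Copy)]
struct SigverifySample {
    num_packets: u64,
    num_batches: u64,
    elapsed_micros: u64,
}

impl SigverifySample {
    /// Average batch density; `None` if no batches were seen.
    fn packets_per_batch(&self) -> Option<f64> {
        (self.num_batches > 0).then(|| self.num_packets as f64 / self.num_batches as f64)
    }

    /// Average processing time per packet; `None` if no packets were seen.
    fn micros_per_packet(&self) -> Option<f64> {
        (self.num_packets > 0).then(|| self.elapsed_micros as f64 / self.num_packets as f64)
    }
}
```

Returning `Option` avoids a division by zero for intervals with no traffic, such as the window right after a restart.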